Image recognition apparatus and image recognition program

ABSTRACT

An image recognition apparatus is configured to recognize movements of players in a sport match or game from contents recording a sport match or game wherein the players match against each other between domains partitioned with such an obstacle as a net, the image recognition apparatus including: a picture information obtaining section configured to obtain picture information containing an image of a movement of at least one of the players from the contents; a sound information obtaining section 103 configured to obtain sound information generated in synchronism with the picture information from the contents, the sound information including information on a hitting sound generated upon hitting of such an instrument as a ball moving between the domains; a hitting time information specifying section 105 configured to specify a hitting time at which the instrument is hit based on the sound information; a rule information storage section 102 configured to store rule information for carrying out the sport match or game; and an image substance recognizing section 106 configured to recognize a substance of an image containing the image of the movement of the player provided by the picture information based on the picture information, a position of the instrument at the specified hitting time and the rule information.

TECHNICAL FIELD

The present invention relates to an image recognition apparatus capable of advantageously recognizing even the substance of an image included in sports-related contents, such as a telecast sport program, which has been conventionally difficult to recognize.

BACKGROUND ART

With the growth of the Internet society in recent years, computer equipment, communications environments and interfaces have become capable of operating at higher speeds in broader bands and, hence, the amount of user-accessible digital picture information is increasing steadily in various fields; for example, various types of picture data are being accumulated in large amounts in many places. Increasing importance has been attached to the art of accessing such massive amounts of information and quickly searching for a desired portion of a picture.

For a user to extract a desired image from a scene of a sport picture of, for example, tennis, methods of recognizing the substance of an image, such as “successful passing shot” and “successful smash”, can conceivably be adopted. Such methods include recognizing the substance of such an image by manually inputting a “successful passing shot” section, a “successful smash” section and like sections of picture information one by one, or by extracting the respective positions of a ball, players and court lines and comprehensively judging changes over time in the spatial correlations among the extracted positions with use of a computer.

The method of image recognition based on manual input can recognize the substance of an image reliably, but it involves increased labor costs and a heavy burden on the operators when the contents processing takes a long time. On the other hand, the method of automatic image recognition with a computer has the inconvenience that, if picture information is the only subject for processing, tracing of the ball fails when the ball is overlapped or hidden by a player, the net or the like, so that an important position and time cannot be specified in a portion of the picture information, resulting in a failure to detect an event to be recognized or in erroneous image recognition.

DISCLOSURE OF INVENTION

In order to solve the foregoing problems, the present invention provides the following means.

That is, the present invention provides an image recognition apparatus for recognizing movements of players matched against each other between domains partitioned with such an obstacle as a net in a sport match or game from contents including a television program being telecasted to show the sport match or game, an image material in an uncompleted state for broadcasting and contents recorded in such a recording medium as a VTR, the image recognition apparatus comprising: a picture information obtaining section configured to obtain picture information containing an image of a movement of at least one of the players playing in the sport match or game from the contents; a sound information obtaining section configured to obtain sound information generated in synchronism with the picture information from the contents, the sound information including information on a hitting sound generated upon hitting of such an instrument as a ball moving between the domains to serve as an object of score count in the sport match or game; a hitting time information specifying section configured to specify a hitting time at which the instrument is hit based on the sound information obtained by the sound information obtaining section; a rule information storage section configured to store rule information for carrying out the sport match or game; and an image substance recognizing section configured to recognize a substance of an image containing the image of the movement of the player provided by the picture information based on the picture information obtained by the picture information obtaining section, a position of the instrument at the hitting time specified by the hitting time information specifying section and the rule information stored in the rule information storage section.

With this configuration, even when image recognition based on the picture information only is difficult, for example, when the position of the instrument is difficult to specify because the instrument is overlapped or hidden by a player or such an obstacle as a net, the hitting time information specifying section specifies the time of the generation of a hitting sound based on the sound information including information on the hitting sound obtained by the sound information obtaining section, and then the image substance recognizing section identifies a movement of a player playing in a sport match or game reliably based on the specified hitting time, the picture information including the image of the player's movement and the rule information for carrying out the sport match or game. Thus, the image recognition apparatus provided by the present invention is capable of superior image recognition without any error in recognizing, for example, a forehand swing, a backhand swing and an overhead swing due to the instrument being overlapped or hidden.

Methods of specifying a hitting time include a method wherein, when the sound information assumes a value higher than a predetermined level, the hitting time information specifying section specifies as the hitting time the point in time at which that higher value is assumed.
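The level-based method above can be sketched as follows. This is only an illustrative sketch: the function name first_time_above, the use of the absolute sample value, and the choice to report only the first crossing are assumptions made for the example and are not taken from the embodiment.

```python
def first_time_above(samples, sample_rate_hz, level):
    """Return the time (in seconds) of the first sample whose magnitude
    exceeds `level`, or None if no sample does.

    Assumption: the "value" of the sound information is taken to be the
    absolute sample amplitude.
    """
    for n, value in enumerate(samples):
        if abs(value) > level:
            return n / sample_rate_hz
    return None


# Usage: a quiet signal with one loud sample at index 3, sampled at 1 kHz.
hit_time = first_time_above([0, 10, -50, 900, 20], 1000, 500)
print(hit_time)
```

A plain threshold of this kind reacts to any loud sound, which is why the filter portion and the pattern matching described later are desirable.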

To eliminate noise contained in the sound information other than the hitting sound, it is desirable that the sound information obtaining section be provided with a filter portion configured to permit sound within a predetermined frequency band to pass therethrough, wherein the sound information is information on the sound having passed through the filter portion. To advantageously eliminate environmental sound including a sound generated when the shoes of a player rub the court during play, a sound of wind and other noises, it is desirable that the filter portion comprise a band-pass filter.

To specify the hitting time more efficiently, it is preferable that the hitting time information specifying section be configured to specify the hitting time based on hitting sound prospect data including data on a predetermined time period within which the hitting sound extracted from the sound information is generated.

To extract the hitting time reliably, the hitting time information specifying section may be configured to extract plural hitting sound prospect data items from the sound information in such a manner that a hitting sound prospect data item generated at one point in time and a subsequent hitting sound prospect data item generated at a succeeding point in time share data on a same time, and then to specify the hitting time based on the plural hitting sound prospect data items. In this case, if the plural hitting sound prospect data items have equal data length and the hitting time information specifying section is configured to extract the plural hitting sound prospect data items from the sound information at constant time intervals, the hitting sound can be extracted efficiently.
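The extraction of equal-length, overlapping hitting sound prospect data items at constant intervals can be illustrated with a short sketch. The function name overlapping_windows and its parameters are hypothetical; the only property taken from the text is that consecutive items have the same length and share data on a same time, which holds whenever the interval is shorter than the item length.

```python
def overlapping_windows(samples, length, hop):
    """Cut `samples` into windows of `length` samples whose start positions
    are `hop` samples apart. Requires 0 < hop < length so that each window
    shares data with its successor.

    Returns a list of (start_index, window) pairs.
    """
    if not 0 < hop < length:
        raise ValueError("hop must be positive and smaller than length")
    last_start = len(samples) - length
    return [(start, samples[start:start + length])
            for start in range(0, last_start + 1, hop)]


# Usage: ten samples, windows of four samples starting every two samples.
for start, window in overlapping_windows(list(range(10)), 4, 2):
    print(start, window)
```

Because every sample away from the very beginning and end of the signal falls into more than one window, a hitting sound that straddles the boundary of one item is still fully contained in a neighbouring item.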

To extract the time of the generation of the hitting sound more reliably, it is desirable that the image recognition apparatus further comprise a hitting sound pattern information storage section configured to store hitting sound pattern information including information on patterns of sound changes that occur depending on how the instrument is hit by such an instrument as a racket constantly held and used by each of the players, wherein the hitting time information specifying section is configured to specify the hitting time based on the hitting sound pattern information stored in the hitting sound pattern information storage section and the sound information.

To extract a characteristic movement of each player from the contents, it is preferable that the picture information obtaining section include a domain element extracting section configured to extract from the picture information facility information including information on the obstacle, information on the domains and information on boundary lines between each of the domains and an area outside the domain, player's position information indicative of a player's position, and instrument information on the instrument moving between the domains to serve as an object of score count in the sport match or game.

To extract players' characteristic movements from the contents more efficiently, it is desirable that the player's position information be position information on a domain containing each of the players and the instrument constantly held and used by the player.

In a specific embodiment of the present invention for extracting the player's position information from the picture information, the domain element extracting section is configured to extract the player's position information from the picture information based on the facility information extracted by the domain element extracting section. In a specific embodiment of the present invention for extracting the instrument information from the picture information, the domain element extracting section is configured to extract the instrument information from the picture information based on the facility information and the player's position information extracted by the domain element extracting section.

To suitably extract a contents element related to a sport of concern, it is desirable that the facility information, the player's position information, the instrument information and the rule information be based on knowledge about the sport that is the subject of image extraction.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram showing the device configuration of an image recognition apparatus according to an embodiment of the present invention.

FIG. 2 is a function block diagram of the embodiment.

FIG. 3 is a diagram illustrating a court model for use in extracting court lines from picture information according to the embodiment.

FIG. 4 is a diagram illustrating a net model for use in extracting net lines from picture information according to the embodiment.

FIG. 5 is a diagram illustrating the court lines and net lines extracted from picture information according to the embodiment.

FIG. 6 is an illustration of a player's domain detected according to the embodiment.

FIG. 7 is an illustration of a ball domain detected according to the embodiment.

FIG. 8 is an illustration of a trace of a ball position.

FIG. 9 is an illustration of a manner of storage by a rule information storage section of the embodiment.

FIG. 10 is an illustration of a manner of identifying player's movements according to the embodiment.

FIG. 11 is a flowchart showing a process of image recognition from picture information according to the embodiment.

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, one embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a diagram showing the device configuration of an image recognition apparatus according to an embodiment of the present invention. FIG. 2 is a function block diagram of the embodiment.

The image recognition apparatus according to this embodiment is configured to recognize characteristic movements of players playing in a sport match or game from sports contents including a television program being telecasted on a television receiver TV or being reproduced by a recording/reproducing device such as a VTR, and such contents as recorded in a recording medium. As shown in FIG. 1, the image recognition apparatus includes, as major components thereof, an input-output interface 11 connected to the television receiver TV and the recording/reproducing device such as a VTR, an external storage device 12 and internal memory 13, such as an HDD or the like, for storing data, programs and the like, a CPU 14 configured to operate according to a program stored in the external storage device 12 or the like to cause the apparatus to function as image recognition apparatus 1, a user interface 15 comprising a keyboard and a mouse for receiving user information about the user, and like components. The “contents”, as used herein, is meant to include pictures including images of movements of players, a shot taken at such an angle as to view a court from obliquely above along the length of the court and a close-up shot of a judge or a spectator, and sound including the voice of a commentator and the like. In this embodiment, reference is made to a tennis program as an example of the “contents”.

In a functional aspect, the image recognition apparatus 1 functions as a domain element extracting section 101, a rule information storage section 102, a sound information obtaining section 103, a hitting sound pattern information storage section 104, a hitting time information specifying section 105, an image substance recognizing section 106, and like sections, as shown in FIG. 2, which functions are fulfilled by the operations of the CPU 14 and the like.

These sections will be described in detail.

The domain element extracting section 101 is configured to extract, from picture information provided by a television receiver, facility information including information on such an obstacle as a net, information on a court as partitioned domains, and information on court lines as boundary lines between the court and an area outside the court, player's position information indicative of the position of each player, and instrument information on an instrument moving between half-courts to serve as an object of score count in a sport match or game of concern. The domain element extracting section 101 is designed to fulfill a part of the function of a picture information obtaining section configured to obtain the picture information containing images of movements of at least one player playing in the sport match or game from the contents. In this embodiment, the facility information to be extracted comprises information on the court lines and information on the net lines; the player's position information to be extracted comprises position information on each of players 1 and 2 matched against each other; and the instrument information to be extracted comprises information on a tennis ball (hereinafter will be referred to as “ball”). The facility information, player's position information and instrument information extracted by the domain element extracting section will be generally referred to as domain elements.

More specifically, in extracting the facility information, information on the court lines and information on the net lines are extracted in this order from the picture information by reference to a court model specifying court characteristic points Pc₁, . . . , Pc₁₄ (hereinafter will be generally referred to as “Pc”) as representative points on the court lines and court lines Lc₁, . . . , Lc₉ (hereinafter will be generally referred to as “Lc”) as shown in FIG. 3 and a net model specifying net characteristic points Pn₁, . . . , Pn₃ (hereinafter will be generally referred to as “Pn”) as representative points on the net lines and net lines Ln₁ and Ln₂ (hereinafter will be generally referred to as “Ln”) as shown in FIG. 4.

First, the court lines are extracted from the picture information by detecting the court characteristic points. More specifically, at a point in time t=0, initial characteristic points Pc(0) are given as inputs; each of the court lines Lc(0) determined by the characteristic points Pc(0) is transformed into a Hough plane; and then a detection window Wc(0) having dimensions W_(th) and W_(ro) is provided about each peak point on the Hough plane. At a point in time t=t, first, a binary image B(t) of an original image and an area around court lines Lc(t−1) are ANDed to generate a binary image Bc(t) comprising only the area around the court (hereinafter will be referred to as “court line binary image”). Subsequently, the process steps of: subjecting this binary image to Hough transformation line by line; performing peak detection within the range limited by each detection window Wc(t−1); updating the court characteristic points Pc(t); subjecting court lines Lc(t) to Hough transformation again; and updating detection windows Wc(t), are performed to extract the court lines from the picture information. If a certain court characteristic point is positioned out of the screen due to panning or the like, update is achieved by estimating the position of the point outside the screen based on connecting knowledge on the assumption that court characteristic points Pc_(i)(t) (i=9, 10, 12, 13, or 10, 11, 13, 14) in a central area of the court are constantly displayed on the screen. For the same reason, some of the initial characteristic points may be omitted. The “connecting knowledge” is knowledge, defined on the basis of knowledge used in playing the sport of concern, that connecting court characteristic points Pc_(i)(t) (i=9, 10, 12, 13), for example, with each other in the central area of the court allows a meaningful zone to be defined on the court model.
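The peak detection restricted to a detection window on the Hough plane can be illustrated roughly as follows. This is a simplified sketch under stated assumptions, not the procedure of the embodiment: it uses the common line parameterization rho = x·cos(theta) + y·sin(theta), it votes only for cells inside the window rather than transforming the whole image first, and the function name hough_peak_in_window, the step sizes, and the treatment of w_th and w_ro as half-widths are all hypothetical.

```python
import math


def hough_peak_in_window(points, theta_center, rho_center, w_th, w_ro,
                         theta_step=math.radians(1.0), rho_step=1.0):
    """Accumulate Hough votes for `points` (x, y) only inside a window of
    half-width `w_th` (radians) and `w_ro` (pixels) centred on the previous
    peak, and return (theta, rho, votes) of the strongest cell, or None.
    """
    votes = {}
    n_theta = int(round(w_th / theta_step))
    for x, y in points:
        for k in range(-n_theta, n_theta + 1):
            theta = theta_center + k * theta_step
            rho = x * math.cos(theta) + y * math.sin(theta)
            if abs(rho - rho_center) <= w_ro:
                cell = (k, int(round((rho - rho_center) / rho_step)))
                votes[cell] = votes.get(cell, 0) + 1
    if not votes:
        return None
    (k, r), count = max(votes.items(), key=lambda item: item[1])
    return theta_center + k * theta_step, rho_center + r * rho_step, count


# Usage: pixels of a horizontal line y = 20, searched near the previous
# frame's estimate (theta = 88 degrees, rho = 18).
line_pixels = [(x, 20) for x in range(50)]
peak = hough_peak_in_window(line_pixels, math.radians(88), 18.0,
                            math.radians(5), 5.0)
print(peak)
```

Limiting the search to a window around the previous peak keeps the tracked line from jumping to an unrelated strong line elsewhere in the frame, which appears to be the purpose of the detection windows Wc.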

Subsequently, the net lines are extracted from the picture information by the following process steps: at a point in time t=0, initial characteristic points Pn(0) are given as inputs; a net line Ln(0) and a detection window Wn(0) are provided for each line in the same manner as with the court lines; at a point in time t=t, an image Bn(t)=B(t)−Bc(t), which is a binary image formed by removing the court line binary image from the binary image of the original image, is generated as a net line binary image; this binary image is then subjected to Hough transformation; peak detection is performed within each detection window; and the characteristic points Pn(t) are updated.

In this way the court lines and the net lines can be extracted as shown in FIG. 5.

In turn, the player's position information is extracted by specifying a domain in which overlapping is maximum in binary images formed by removing the court lines and the net lines from the picture information.

More specifically, at a point in time t=t, differences from images that are forwardly and backwardly apart from an image of concern by s frames are found to generate binary images B₁(t) and B₂(t) using appropriate threshold values. Here, B₁(t)=BIN(I(t)−I(t−s)), and B₂(t)=BIN(I(t+s)−I(t)), wherein BIN is a function making the parenthesized factor binary. Based on a binary image B_(diff)(t) resulting from an AND operation on these two difference images and a binary image B_(label)(t) in which those points on an image I(t) at a point in time t=t which are included in a color cluster corresponding to a predetermined representative color of, for example, a player's uniform are each defined as 1, the court lines and the net lines are erased. Further, a domain from which a portion overlapping the player's domain is considered to have been removed is compensated for through expansion/compression processing. The two images thus obtained are ORed to give a binary image B(t) as shown in FIG. 6. A connected domain within the binary image B(t) thus obtained is labeled and the thus labeled domain is observed throughout several frames to avoid the influence of noise. Such a domain in the area covering the court and the area therearound is determined as a player's initial position if the domain has an area larger than a predetermined value. Of such domains each having an area larger than the predetermined value at the point in time t=t, the domain which is located adjacent to a player's domain at a point in time t=t−1 and has the smallest difference in area from that player's domain is judged to be a player's domain p, thereby providing player's position information.
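The construction of B_(diff)(t) from the two frame differences can be sketched in a few lines. The sketch assumes grayscale frames stored as nested lists and assumes that BIN thresholds the absolute difference; the text only says that appropriate threshold values are used, so the exact form of BIN and the function names below are illustrative.

```python
def binarize(diff, threshold):
    """BIN: 1 where the absolute difference exceeds the threshold, else 0.
    (Use of the absolute value is an assumption of this sketch.)"""
    return [[1 if abs(v) > threshold else 0 for v in row] for row in diff]


def subtract(a, b):
    """Pixel-wise difference a - b of two equally sized images."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def logical_and(a, b):
    """Pixel-wise AND of two binary images."""
    return [[pa & pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def moving_mask(frame_before, frame, frame_after, threshold):
    """B_diff(t) = BIN(I(t) - I(t-s)) AND BIN(I(t+s) - I(t))."""
    b1 = binarize(subtract(frame, frame_before), threshold)
    b2 = binarize(subtract(frame_after, frame), threshold)
    return logical_and(b1, b2)


# Usage: a bright pixel that sits at column 1 at time t and has moved to
# column 2 by time t+s.
print(moving_mask([[0, 0, 0]], [[0, 9, 0]], [[0, 0, 9]], 5))
```

ANDing the backward and forward differences keeps only pixels that differ from both neighbouring frames, which localizes the moving object at its position in the current frame rather than at its past or future position.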

The ball is extracted by switching between a detection mode and a trace mode in accordance with the distance from the player's position given by the player's position information thus extracted.

More specifically, the detection mode is a mode for detecting all ball prospect positions each matching a predetermined template T_(b)(x,y) in a domain around each player within an image I′_(B) from which the players' positions P have been erased at the point in time t, with use of the template T_(b)(x,y) provided with a ball size of b_(x)×b_(y), as shown in FIG. 7. Likewise, ball prospects at points of time t=t+1, t+2, . . . are detected, series of ball prospects Ba which are detected to be radially consecutive from about a player's position are chosen, and the number of such series of ball prospects Ba is reduced by selection to find a single series of ball prospects Ba. The finally selected series of ball prospects Ba can be specified as a ball trajectory BW within a time segment of concern. The template T_(b)(x,y) is a kind of tool provided for extracting the ball from the picture information. In this embodiment the size of the ball to be displayed as expanded or compressed is provisionally established as b_(x)×b_(y), and a periphery slightly expanded outwardly from b_(x)×b_(y) is established as the template.

The trace mode is a mode for tracing the ball trajectory BW by template matching with the template T_(b)(x,y). In this mode, tracing is conducted using as the center of estimation a position obtained by adding the amount of movement detected last time directly to the position in the current frame, on the assumption that the ball trajectory BW within a very short period of time can be considered to be substantially straight. When the distance between a player's domain and the position of a ball prospect Ba becomes smaller than a certain threshold value, the trace mode is switched to the detection mode. If not, the trace mode operation is repeatedly conducted.
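The prediction step of the trace mode and the switch back to the detection mode can be sketched as follows. The names predict_next and next_mode are hypothetical, and measuring the distance to a single representative point of the player's domain (rather than to the domain itself) is a simplifying assumption of this sketch.

```python
import math


def predict_next(position, last_move):
    """Centre of the next search: current position plus the displacement
    observed in the previous step (straight-line assumption)."""
    return (position[0] + last_move[0], position[1] + last_move[1])


def next_mode(ball_position, player_point, threshold):
    """Return "detection" when the ball prospect is closer to the player than
    `threshold` (in pixels), otherwise stay in "trace"."""
    distance = math.hypot(ball_position[0] - player_point[0],
                          ball_position[1] - player_point[1])
    return "detection" if distance < threshold else "trace"


# Usage: the ball moved by (3, -2) in the last step.
estimate = predict_next((10, 20), (3, -2))
print(estimate, next_mode(estimate, (15, 18), 5))
```

Template matching would then be carried out only in a small neighbourhood of the predicted centre, which keeps the search cheap while the ball is in free flight.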

In this way, the ball trajectory BW within a desired time segment can be obtained as shown in FIG. 8. Note that the ball trajectory BW is superimposed on picture information obtained at a desired point in time in FIG. 8 for convenience in showing the ball trajectory BW.

The rule information storage section 102 is configured to store rule information required for carrying out a sport of concern and is provided in a predetermined area of the external storage device 12 or internal memory 13. More specifically, as shown in FIG. 9, the rule information includes rule information items defining respective rule information indexes including, for example, a rule information index “service” defined by the description that “the server stands rearwardly of the base line away from the net with both feet on the ground between imaginary extensions of the center mark and a side line, respectively. The server throws a ball up into the air in any direction and then hits the ball with the racket before it falls to the ground. The service is considered to have been completed at the moment the racket and the ball contact each other.”, and a rule information index “fall of the ball on a court line” defined by the description that “the ball having fallen on a court line is considered to have fallen to the ground within the court delimited by the court line.”

The sound information obtaining section 103 is configured to obtain sound information containing a hitting sound generated upon hitting of the ball and like sound from the contents by sampling the sound information at a resolution of 16 bits and a sampling rate of 44.1 kHz. In this embodiment, the sound information obtaining section 103 is provided with a filter portion (not shown) for advantageously extracting only the hitting sound by filtering off sound information other than the hitting sound, including a sound generated when the shoes of a player rub the court during play, the sound of wind and other noises. More specifically, the filter portion comprises a band-pass filter for permitting sound within a predetermined frequency band to pass therethrough, the band-pass filter comprising a digital circuit such as an FIR filter, an IIR filter or the like. In this embodiment, the band-pass filter is configured to permit signal components within a frequency band of 100 to 1500 Hz to pass therethrough.
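One possible FIR realization of such a band-pass filter is sketched below using only the standard library. The windowed-sinc design, the Hamming window, the tap count of 401 and all function names are assumptions chosen for illustration; the embodiment states only the 100 to 1500 Hz pass band and the 44.1 kHz sampling rate.

```python
import cmath
import math


def lowpass_kernel(cutoff_hz, sample_rate_hz, taps):
    """Hamming-windowed sinc low-pass kernel, normalized to unit gain at DC."""
    mid = (taps - 1) / 2
    kernel = []
    for n in range(taps):
        x = 2 * cutoff_hz / sample_rate_hz * (n - mid)
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        kernel.append(sinc * window)
    total = sum(kernel)
    return [k / total for k in kernel]


def bandpass_kernel(low_hz, high_hz, sample_rate_hz, taps=401):
    """Band-pass kernel formed as the difference of two low-pass kernels."""
    high = lowpass_kernel(high_hz, sample_rate_hz, taps)
    low = lowpass_kernel(low_hz, sample_rate_hz, taps)
    return [h - l for h, l in zip(high, low)]


def gain(kernel, freq_hz, sample_rate_hz):
    """Magnitude of the kernel's frequency response at `freq_hz`."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    return abs(sum(k * cmath.exp(-1j * w * n) for n, k in enumerate(kernel)))


def fir_filter(samples, kernel):
    """Causal FIR filtering by direct convolution (zero initial state)."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, k in enumerate(kernel):
            if i - j < 0:
                break
            acc += k * samples[i - j]
        out.append(acc)
    return out


# Usage: inspect the response inside and outside the pass band.
kernel = bandpass_kernel(100.0, 1500.0, 44100.0)
for f in (0.0, 800.0, 5000.0):
    print(f, round(gain(kernel, f, 44100.0), 4))
```

With this tap count the lower band edge is not sharp, because 100 Hz is very low relative to the sampling rate; a longer kernel or an IIR design would tighten it at the cost of more computation or non-linear phase.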

The hitting sound pattern information storage section 104 is configured to store, as hitting sound pattern information, information on patterns of sound changes that occur depending on how the instrument is hit by a racket, the patterns being categorized according to hitting sounds generated by different sorts of strokes such as a smash and a forehand stroke, with each of the patterns being associated with a predetermined frequency and an amplitude at this frequency. The hitting sound pattern information storage section 104 is provided in a predetermined area of the external storage device 12 or internal memory 13. The hitting sound pattern information storage section 104 may be configured to store patterns of sound other than the sound generated upon hitting of a ball with a racket, for example, sound generated upon a bounce of a ball on the court.

The hitting time information specifying section 105 is configured to specify a hitting time based on the hitting sound pattern information stored in the hitting sound pattern information storage section 104 and the sound information obtained by the sound information obtaining section 103.

More specifically, the hitting time information specifying section 105 performs FFT processing on the sound information obtained by the sound information obtaining section 103 in windows of 2048 points (≈0.046 sec) whose start time is shifted at intervals of 128 points (≈0.0029 sec), and checks a frequency characteristic pattern of a sound information item converted to the frequency domain at each point in time against hitting sound pattern information items stored in the hitting sound pattern information storage section 104. If the frequency characteristic pattern of the sound information item is found to match with a hitting sound pattern information item as a result of the checking, the hitting time information specifying section 105 specifies as ball hitting time t_(a) the point in time at which the sound information item having the frequency characteristic pattern matching with the hitting sound pattern information item is generated and then outputs the hitting time t_(a) thus specified to the image substance recognizing section 106. In this embodiment the hitting time information specifying section 105 is designed to use a correlation function in checking the matching between the frequency characteristic pattern of a sound information item and a hitting sound pattern information item, and if the value of the correlation function is larger than a predetermined threshold value, the frequency characteristic pattern of the sound information item and the hitting sound pattern information item are considered to match with each other.
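The window timing and the correlation check can be illustrated with a small sketch. At a sampling rate of 44.1 kHz, 2048 points correspond to about 0.046 sec and 128 points to about 0.0029 sec. The text does not say which correlation function is used; the normalized (Pearson) correlation below, the function names, and the assumption that each window has already been converted to a magnitude spectrum are illustrative choices only.

```python
import math

SAMPLE_RATE_HZ = 44100
WINDOW_POINTS = 2048   # length of each FFT window
HOP_POINTS = 128       # shift between consecutive window start times


def correlation(a, b):
    """Normalized correlation of two equally long spectra, in [-1, 1].
    Returns 0.0 when either spectrum is constant."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def match_hit_times(spectra, pattern, threshold):
    """Return the start times (seconds) of the windows whose spectrum
    correlates with `pattern` more strongly than `threshold`.
    `spectra[i]` is the spectrum of the window starting at i * HOP_POINTS."""
    return [i * HOP_POINTS / SAMPLE_RATE_HZ
            for i, spectrum in enumerate(spectra)
            if correlation(spectrum, pattern) > threshold]


# Usage with toy five-bin spectra: only the second window has the same
# shape as the stored pattern.
pattern = [0, 1, 4, 1, 0]
spectra = [[1, 1, 1, 1, 1], [0, 2, 8, 2, 0], [4, 1, 0, 1, 4]]
print(WINDOW_POINTS / SAMPLE_RATE_HZ, HOP_POINTS / SAMPLE_RATE_HZ)
print(match_hit_times(spectra, pattern, 0.9))
```

A normalized measure is insensitive to the overall loudness of the hit, so the same stored pattern can match both soft and hard strokes of the same kind; whether the embodiment normalizes in this way is not stated.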

The image substance recognizing section 106 is configured to recognize the substance of an image containing a player's movement provided by the picture information based on the court lines and net lines and player's position information extracted by the domain element extracting section 101, the position of the instrument at the hitting time t_(a) specified by the hitting time information specifying section 105 and the rule information stored in the rule information storage section 102.

More specifically, as shown in FIG. 10, ball position P_(i)(t_(a)) at the specified hitting time t_(a) is determined by estimating an appropriate trajectory from the last detected ball position or N points following the last detected ball position. From the ball position P_(i)(t_(a)) thus determined and a player's position, a player's movement is identified. For example, if the ball is above an identification line extending through an upper portion of a rectangle circumscribing a player hitting the ball at the hitting time t_(a), the movement of the player is identified as “overhead_swing”, while if the ball is on the foreside or backside with respect to the center of gravity of the player, the movement of the player is identified as “forehand_swing” or “backhand_swing”. The identification line is established to extend through an upper portion of a player's domain determined by a fixed proportion to the vertical length of the player's circumscribing rectangle.
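The identification rule can be sketched as a small function. Several details are assumptions made for the sketch and are not given in the text: image coordinates with y increasing downward, an identification line placed at 20 % of the rectangle height below its top edge, and a forehand_sign parameter that says on which side of the center of gravity the foreside lies (this depends on the player's handedness and facing direction).

```python
def classify_swing(ball, rect, centroid_x, line_ratio=0.2, forehand_sign=1):
    """Classify a stroke from the ball position at the hitting time.

    ball:          (x, y) ball position, y increasing downward
    rect:          (left, top, right, bottom) rectangle circumscribing the player
    centroid_x:    x coordinate of the player's center of gravity
    line_ratio:    position of the identification line as a fraction of the
                   rectangle height measured from its top (assumed value)
    forehand_sign: +1 if the foreside is toward larger x, -1 otherwise
    """
    left, top, right, bottom = rect
    line_y = top + line_ratio * (bottom - top)
    if ball[1] < line_y:                      # ball above the line
        return "overhead_swing"
    side = (ball[0] - centroid_x) * forehand_sign
    return "forehand_swing" if side >= 0 else "backhand_swing"


# Usage: a player occupying x 100..140 and y 50..150, centroid at x = 120.
player_rect = (100, 50, 140, 150)
print(classify_swing((120, 60), player_rect, 120))
print(classify_swing((135, 100), player_rect, 120))
print(classify_swing((105, 100), player_rect, 120))
```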

Next, the operation of the image recognition apparatus according to this embodiment will be described with reference to the flowchart of FIG. 11.

Initially, court lines and net lines are extracted from picture information containing images of movements of the players during play (step S101). Player's position information is extracted from the picture information using a binary image formed by removing the court lines and the net lines from the picture information (step S102). Based on the player's position information thus extracted, a ball is extracted from the picture information (step S103). Subsequently, sound information containing a hitting sound generated upon hitting of the ball is obtained by filtering the sound information with the filter portion (step S104). The FFT processing is performed on the filtered sound information thus obtained with the start time being shifted at predetermined intervals (step S105). The frequency characteristic pattern of a hitting sound prospect data item obtained at each point in time by transforming a sound information item to the frequency domain by the FFT processing is checked against hitting sound pattern information items stored in the hitting sound pattern information storage section 104 (step S106). If the frequency characteristic pattern of the hitting sound prospect data item is found to match with a hitting sound pattern information item according to the result of the checking (step S107), the point in time at which the hitting sound prospect data item having the frequency characteristic pattern matching with the hitting sound pattern information item is generated is specified as ball hitting time t_(a) (step S108). If the frequency characteristic pattern of the hitting sound prospect data item is found not to match with the hitting sound pattern information item (step S107), the frequency characteristic pattern of a hitting sound prospect data item generated at the next point in time is checked against the hitting sound pattern information items (step S106). Based on the ball position and player's position at the specified hitting time and the rule information, three movements, i.e., “forehand_swing” indicative of a forehand swing motion, “backhand_swing” indicative of a backhand swing motion and “overhead_swing” indicative of an overhead swing motion, can be recognized as shown in FIG. 10 even when inconveniences occur in image recognition, for example, such an inconvenience that the ball is overlapped or hidden by a player (step S109).

As described above, even when image recognition based on the picture information only is difficult, for example, when the position of the instrument is difficult to specify because the instrument is overlapped or hidden by a player or such an obstacle as a net in a picture, the hitting time information specifying section specifies the hitting time at which a hitting sound is generated based on the sound information including information on the hitting sound obtained by the sound information obtaining section, and then the image substance recognizing section identifies a player's movement reliably based on the specified hitting time, the picture information containing the image of the player's movement during play and the rule information for carrying out the sport match or game. Thus, it is possible to provide a relatively inexpensive image recognition apparatus which is excellent in image recognition ability and which is capable of preventing recognition errors that have been impossible to prevent in image recognition based on the picture information only, for example, errors in identifying a forehand swing, a backhand swing and an overhead swing due to the instrument being overlapped or hidden. It is needless to say that the image recognition apparatus is capable of advantageous image recognition even when the ball and a player are not overlapped or hidden by each other.

Even if the obtained sound information contains noise other than the hitting sound, the filter portion is capable of filtering out such noise. For this reason, robust image recognition with a high recognition rate is possible.

Since the hitting time information specifying section is configured to obtain plural hitting sound prospect data items from the sound information and specify a hitting time based on these hitting sound prospect data items, the hitting time can be specified accurately. Further, the hitting time information specifying section is configured to obtain the plural hitting sound prospect data items in such a manner that a hitting sound prospect data item generated at one point in time and another hitting sound prospect data item generated at an immediately preceding or succeeding point in time share data on the same time. Accordingly, it is possible to avoid a failure to specify a hitting time.

In this embodiment, a tennis program is used as an example of the contents, and the facility information extracted as a domain element from the picture information on the tennis program includes information on court lines and information on net lines. Needless to say, if the contents change from the tennis program to another sport program or the like, the facility information to be extracted changes as well. Similarly, the player's position information and the instrument information also change.

This embodiment is configured to recognize characteristic movements of players playing in a sport match or game from sports contents including a television program being telecasted on a television receiver TV or being reproduced by a recording/reproducing device such as a VTR, and such contents as are recorded in a recording medium. However, the media through which contents subject to image recognition are provided are not limited to those used in this embodiment. For example, it is possible to recognize characteristic movements of players playing in a sport match or game from image materials which have just been taken from the sport match or game at a stadium and hence are not yet completed for broadcasting, or from archived picture information on the Internet.

While the image substance recognizing section 106 is configured to recognize the three movements, i.e., “forehand_swing” indicative of a forehand swing motion, “backhand_swing” indicative of a backhand swing motion and “overhead_swing” indicative of an overhead swing motion, as the substance of an image containing a player's movement provided by the picture information, the image substance recognizing section 106 may be configured to recognize “stay” indicative of a player staying on the spot and “move” indicative of a player moving, based on the relation between the ball position and the player's position or a like relation. If the rule information to be stored in the rule information storage section 102 includes more complicated definitions, including definitions of various player's movements, the image substance recognizing section 106 will be capable of recognizing more complicated player's movements.

While this embodiment is configured to extract a ball from picture information using the predetermined template T_(b)(x,y) having a ball size of B_(x)×B_(y), the ball may be extracted without using the template.

While the sound information obtaining section 103 is provided with the filter portion comprising a band-pass filter, an embodiment of the sound information obtaining section 103 employing a filter other than the band-pass filter is possible. Further, the frequency band permitted to pass through the filter portion is not limited to 100 to 1500 Hz.
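As one possible illustration of such a filter portion, the sketch below passes roughly 100 to 1500 Hz using a single second-order (biquad) band-pass section. The biquad design, its coefficient formulas (the commonly used audio EQ cookbook form) and the names bandpass_coefficients and bandpass are assumptions made for this example; the embodiment states only that a band-pass filter is used.

```python
import math


def bandpass_coefficients(low_hz, high_hz, sample_rate):
    """Return biquad band-pass coefficients (b, a), normalized so a[0] == 1."""
    center = math.sqrt(low_hz * high_hz)       # geometric center of the pass band
    q = center / (high_hz - low_hz)            # quality factor from the bandwidth
    w0 = 2.0 * math.pi * center / sample_rate  # center frequency in radians/sample
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def bandpass(samples, low_hz=100.0, high_hz=1500.0, sample_rate=44_100):
    """Filter samples with the biquad above, using the direct form I recurrence."""
    b, a = bandpass_coefficients(low_hz, high_hz, sample_rate)
    x1 = x2 = y1 = y2 = 0.0  # previous two inputs and outputs
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

A single second-order section has gentle roll-off, so sound outside the band is attenuated rather than removed; a steeper filter would need a higher order.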

The sound information obtaining section 103 is configured to obtain sound information containing a hitting sound generated upon hitting of the ball and like sound from the contents by sampling the sound information at a resolution of 16 bits and a sampling rate of 44.1 kHz. However, there is no particular limitation on the values of the resolution and the sampling rate.

In this embodiment the hitting time information specifying section 105 is configured to perform FFT processing on the sound information obtained by the sound information obtaining section 103 on a 2048-point (≈0.046 sec) basis with its start time being shifted at intervals of 128 points (≈0.0029 sec). However, the numbers of points used in the FFT processing are not limited to these values.
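The relation between point counts and durations at a 44.1 kHz sampling rate can be checked with a short sketch. The function names window_starts and points_to_seconds are hypothetical and chosen only for this illustration.

```python
SAMPLE_RATE_HZ = 44_100  # sampling rate stated for this embodiment
WINDOW_POINTS = 2048     # length of each FFT window, in samples
HOP_POINTS = 128         # shift between consecutive window starts, in samples


def points_to_seconds(points, sample_rate=SAMPLE_RATE_HZ):
    """Convert a number of samples to a duration in seconds."""
    return points / sample_rate


def window_starts(total_samples, window=WINDOW_POINTS, hop=HOP_POINTS):
    """Return the start index of every full window that fits in the signal."""
    if total_samples < window:
        return []
    return list(range(0, total_samples - window + 1, hop))


print(round(points_to_seconds(WINDOW_POINTS), 3))  # 0.046
print(round(points_to_seconds(HOP_POINTS), 4))     # 0.0029
print(window_starts(WINDOW_POINTS + 2 * HOP_POINTS))  # [0, 128, 256]
```

With these values, consecutive windows share 2048 - 128 = 1920 points, so a hitting sound that straddles the edge of one window falls well inside a neighboring one.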

In this embodiment the hitting time information specifying section 105 is designed to use a correlation function in checking the matching between the frequency characteristic pattern of a sound information item and a hitting sound pattern information item, and if the value of the correlation function is larger than the predetermined threshold value, the frequency characteristic pattern of the sound information item and the hitting sound pattern information item are considered to match each other. It is, however, possible to employ other methods of checking the matching between the frequency characteristic pattern of a sound information item and a hitting sound pattern information item.
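A threshold test of this kind might look like the following sketch. The specific correlation function (here the Pearson correlation coefficient), the threshold value of 0.8 and the names correlation and matches_any are assumptions made for illustration; the particular function and threshold of the embodiment are not restated in this passage.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length spectra."""
    if len(a) != len(b) or not a:
        raise ValueError("spectra must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    # A flat spectrum has no variation to correlate; treat it as no match.
    return 0.0 if den == 0 else num / den


def matches_any(spectrum, patterns, threshold=0.8):
    """True if the spectrum correlates above the threshold with any pattern.

    The default threshold of 0.8 is a hypothetical value.
    """
    return any(correlation(spectrum, p) > threshold for p in patterns)
```

Because the coefficient is insensitive to overall scale, a quiet and a loud hit with the same spectral shape would score alike under this choice.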

The specific features of other sections or parts are not limited to this embodiment but may be modified variously without departing from the concept of the present invention.

INDUSTRIAL APPLICABILITY

According to the present invention described above, even when image recognition based on picture information only is difficult, for example, when the position of an instrument used in a sport match or game is difficult to specify because the instrument overlaps or is hidden by a player or by such an obstacle as a net, the hitting time information specifying section specifies the hitting time at which a hitting sound is generated based on sound information including information on the hitting sound obtained by the sound information obtaining section, and the image substance recognizing section then identifies a player's movement reliably based on the specified hitting time, the picture information containing the image of the player's movement during play and the rule information for carrying out the sport match or game. Thus, it is possible to provide a relatively inexpensive image recognition apparatus which is excellent in image recognition ability and which is capable of preventing recognition errors that have been impossible to prevent in image recognition based on the picture information only, for example, errors in distinguishing among a forehand swing, a backhand swing and an overhead swing due to the instrument being overlapped or hidden.

1. An image recognition apparatus for recognizing movements of players matched against each other between domains partitioned with such an obstacle as net in a sport match or game from contents including a television program being telecasted to show the sport match or game, an image material in an uncompleted state for broadcasting and contents recorded in such a recording medium as a VTR, the image recognition apparatus comprising: a picture information obtaining section configured to obtain picture information containing an image of a movement of at least one of the players playing in the sport match or game from the contents; a sound information obtaining section configured to obtain sound information generated in synchronism with the picture information from the contents, the sound information including information on a hitting sound generated upon hitting of such an instrument as a ball moving between the domains to serve as an object of score count in the sport match or game; a hitting time information specifying section configured to specify a hitting time at which the instrument is hit based on the sound information obtained by the sound information obtaining section; a rule information storage section configured to store rule information for carrying out the sport match or game; and an image substance recognizing section configured to recognize a substance of an image containing the image of the movement of the player provided by the picture information based on the picture information obtained by the picture information obtaining section, a position of the instrument at the hitting time specified by the hitting time specifying section and the rule information stored in the rule information storage section.
2. The image recognition apparatus according to claim 1, wherein when the sound information assumes a value higher than a predetermined level, the hitting time information specifying section specifies as the hitting time a point in time at which the higher value is assumed.
3. The image recognition apparatus according to claim 1, wherein: the sound information obtaining section is provided with a filter portion configured to permit sound within a predetermined frequency band to pass therethrough; and the sound information is information on the sound having passed through the filter portion.
4. The image recognition apparatus according to claim 3, wherein the filter portion comprises a band-pass filter.
5. The image recognition apparatus according to claim 1, wherein the hitting time information specifying section is configured to specify the hitting time based on hitting sound prospect data including data on a predetermined time period within which the hitting sound extracted from the sound information is generated.
6. The image recognition apparatus according to claim 1, wherein the hitting time information specifying section is configured to extract plural hitting sound prospect data items from the sound information in such a manner that a hitting sound prospect data item generated at one point in time and a subsequent hitting sound prospect data item generated at a succeeding point in time share data on a same time and then specify the hitting time based on the plural hitting sound prospect data items.
7. The image recognition apparatus according to claim 6, wherein the plural hitting sound prospect data items have equal data length, while the hitting time information specifying section is configured to extract the plural hitting sound prospect data items from the sound information at constant time intervals.
8. The image recognition apparatus according to claim 1, further comprising a hitting sound pattern information storage section configured to store hitting sound pattern information including information on patterns of sound changes that occur depending on how the instrument is hit by such an instrument as a racket constantly held and used by each of the players, wherein the hitting time information specifying section is configured to specify the hitting time based on the hitting sound pattern information stored in the hitting sound pattern information storage section and the sound information.
9. The image recognition apparatus according to claim 1, wherein the picture information obtaining section includes a domain element extracting section configured to extract from the picture information facility information including information on the obstacle, information on the domains and information on boundary lines between each of the domains and an area outside the domain, player's position information indicative of a player's position, and instrument information on the instrument moving between the domains to serve as an object of score count in the sport match or game.
10. The image recognition apparatus according to claim 9, wherein the player's position information is position information on a domain containing each of the players and the instrument constantly held and used by the player.
11. The image recognition apparatus according to claim 10, wherein the domain element extracting section is configured to extract the player's position information from the picture information based on the facility information extracted by the domain element extracting section.
12. The image recognition apparatus according to claim 9, wherein the domain element extracting section is configured to extract the instrument information from the picture information based on the facility information and the player's position information extracted by the domain element extracting section.
13. The image recognition apparatus according to claim 9, wherein the facility information, the player's position information, the instrument information and the rule information are based on knowledge about a sport as a subject for image extraction.
14. An image recognition program cooperative with a computer for causing an image recognition apparatus to operate to recognize movements of players matched against each other between domains partitioned with such an obstacle as net in a sport match or game from contents including a television program being telecasted to show the sport match or game, an image material in an uncompleted state for broadcasting and contents recorded in such a recording medium as a VTR, the image recognition program being configured to cause the image recognition apparatus to function as: a picture information obtaining section configured to obtain picture information containing an image of a movement of at least one of the players playing in the sport match or game from the contents; a sound information obtaining section configured to obtain sound information generated in synchronism with the picture information from the contents, the sound information including information on a hitting sound generated upon hitting of such an instrument as a ball moving between the domains to serve as an object of score count in the sport match or game; a hitting time information specifying section configured to specify a hitting time at which the instrument is hit based on the sound information obtained by the sound information obtaining section; a rule information storage section configured to store rule information for carrying out the sport match or game; and an image substance recognizing section configured to recognize a substance of an image containing the image of the movement of the player provided by the picture information based on the picture information obtained by the picture information obtaining section, a position of the instrument at the hitting time specified by the hitting time specifying section and the rule information stored in the rule information storage section.